Method and system for managing a distributed network of network monitoring devices

ABSTRACT

Network traffic information for nodes of a first logical hierarchy is stored at a monitoring device according to ranks of the nodes within the logical hierarchy, as determined by each node's position therein and by user preferences. At least some of the network traffic information stored at the network monitoring device may then be reported to another network monitoring device, where it can be aggregated with similar information from other network monitoring devices. Such reporting may occur according to rankings of inter-node communication links between nodes of different logical hierarchies of monitored nodes.

This application is a Continuation of U.S. patent application Ser. No. 11/092,226, filed Mar. 28, 2005.

FIELD OF THE INVENTION

The present invention relates to: (a) managing data stored by individual network monitoring devices, for example where such network monitoring devices are configured to store network traffic information relating to logical groupings of network nodes, and (b) managing data stored by such network monitoring devices when they are arranged in a distributed network of their own (i.e., a network monitoring device network).

BACKGROUND

Today, information technology professionals encounter many different problems and challenges while operating a computer network or network of networks. For example, they must often cope with network device failures and/or software application errors caused by configuration errors or other faults. Tracking down the sources of such problems often involves analyzing network and device data collected by monitoring units deployed at various locations throughout the network.

Traditional network monitoring solutions group network traffic according to whether a network node is a “client” or a “server”. More advanced processes, such as those described in co-pending patent application Ser. No. 10/937,986, filed Sep. 10, 2004, assigned to the assignee of the present invention and incorporated herein by reference, allow data to be grouped by the role being played by a network node and/or by logical units (business units) constructed by network operators for the purpose of monitoring and diagnosing network problems. These advanced monitoring techniques can give operators the information they need to diagnose and/or solve problems quickly.

These advanced forms of network monitoring, however, bring problems of their own. For example, collecting and storing data for all logical groupings of nodes and all inter-nodal communication paths in a network quickly becomes unmanageable as that network grows. Consequently, what is needed are methods and systems that facilitate centralized network monitoring for large, distributed networks.

SUMMARY OF THE INVENTION

In one embodiment of the present invention, network traffic information for those nodes of a first logical hierarchy of monitored network nodes that can be accommodated by a first network monitoring device is stored according to ranks of the monitored network nodes within the logical hierarchy, as determined by each node's position therein and by user preferences. At least some of the network traffic information stored at the first network monitoring device may then be reported from the first network monitoring device to a second network monitoring device of the network monitoring device network, e.g., one acting as a centralized network monitoring device. For example, the second network monitoring device may receive the portion of the network traffic information stored at the first network monitoring device that is selected according to rankings of inter-node communication links between nodes of the first logical hierarchy of monitored network nodes (of the first network monitoring device) and nodes of a second logical hierarchy of monitored network nodes (of a third network monitoring device of the network monitoring device network). Such rankings of inter-node communication links may be determined according to the ranks of the individual nodes associated with the communication links within the corresponding first and second logical hierarchies of nodes, each such rank being determined according to (i) a first distance measured from a root node of the hierarchy under consideration to the node under consideration, (ii) a second distance measured from a leaf node of the hierarchy under consideration to the node under consideration, and (iii) user preferences. Likewise, the ranks of the monitored network nodes within the first logical hierarchy of nodes of the first network monitoring device may be determined according to a first distance measured from a root node of the hierarchy to the node under consideration, a second distance measured from a leaf node of the hierarchy to the node under consideration, and user preferences.
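The paragraph above leaves open exactly how the ranks of two endpoint nodes combine into a link rank. The sketch below assumes, purely for illustration, that a link is ranked by the lower of its two endpoint ranks, and that a reporting device forwards only the top `budget` links. The functions `rank_links` and `links_to_report`, the node names (e.g., "NY/Trading") and the rank values are all hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]   # (node in hierarchy A, node in hierarchy B)
NodeRanks = Dict[str, float]  # higher value = more important node

def rank_links(links: List[Link], ranks_a: NodeRanks,
               ranks_b: NodeRanks) -> List[Tuple[Link, float]]:
    """Score each inter-hierarchy link.

    ASSUMPTION: a link is only as important as its less important endpoint,
    so its rank is the minimum of the two endpoint ranks."""
    scored = [((u, v), min(ranks_a[u], ranks_b[v])) for u, v in links]
    # Highest-ranked links first; ties broken by link name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored

def links_to_report(links: List[Link], ranks_a: NodeRanks,
                    ranks_b: NodeRanks, budget: int) -> List[Link]:
    """Return at most `budget` links whose traffic data would be forwarded."""
    return [link for link, _ in rank_links(links, ranks_a, ranks_b)[:budget]]

# Hypothetical ranks for two hierarchies monitored by different devices.
ranks_ny = {"NY": 300.0, "NY/Trading": 250.0}
ranks_ca = {"CA": 300.0, "CA/LA": 260.0, "CA/LA/SM": 210.0}
links = [("NY", "CA"), ("NY/Trading", "CA/LA"), ("NY", "CA/LA/SM")]
print(links_to_report(links, ranks_ny, ranks_ca, budget=2))
```

Taking the minimum is only one plausible choice; a sum or a weighted combination of endpoint ranks would fit the same description.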

In further embodiments of the present invention, nodes of a grouping of nodes within a network are ranked, at a first network monitoring device, according to each node's position within a logical hierarchy of the nodes of the grouping and according to user preferences; network traffic data associated with the nodes of the grouping is then stored or not stored according to each node's rank as so determined. Thereafter, at least some of the network traffic data stored according to each node's rank may be transferred from the first network monitoring device to a second network monitoring device, for example if that rank satisfies additional ranking criteria concerning communications between nodes of different groupings.

Yet another embodiment of the present invention allows for aggregating, at a network monitoring device, network traffic information for inter-node communications between nodes of different logical groupings of nodes (said logical groupings including groupings defined in terms of other logical groupings of nodes) according to ranks of individual nodes within each of the different logical groupings associated with the inter-node communications. Each such rank is determined according to a first distance measured from a root node of the logical hierarchy of the logical grouping under consideration to the node thereof under consideration, a second distance measured from a leaf node of that logical hierarchy to the node under consideration, and user preferences. Such aggregating may proceed incrementally for each branch of a logical group-to-logical group hierarchy constructed by the network monitoring device.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not limitation, in the figures of the accompanying drawings, in which:

FIG. 1 illustrates an example of a computer network and its associated network monitoring device;

FIG. 2 illustrates an example of a network of network monitoring devices deployed in accordance with an embodiment of the present invention; and

FIG. 3 illustrates an example of a business group organization (BGO) hierarchy in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

Described herein are methods and systems to facilitate network monitoring for computer networks. The present invention encompasses both the management of data stored by individual network monitoring devices, for example where such devices are configured to store network traffic information relating to logical groupings of network nodes, and the management of data stored by such network monitoring devices when they are arranged in a distributed network of their own (i.e., a network monitoring device network). The management of data stored by individual network monitoring devices is described first. Thereafter, techniques for aggregating and managing the storage of such data among a network of monitoring devices are presented.

For the first case, managing data stored by individual monitoring devices, consider that a large, distributed network may contain many (potentially hundreds or even thousands of) node-to-node communication links. Here we refer not strictly to the physical inter-node communication links (e.g., optical fibers, copper wires, and the like), but rather to the virtual or logical node-to-node connections that permit communication between two or more nodes in one or more computer networks. Because the physical memory available to a network monitoring device is limited, it quickly becomes impractical (and eventually impossible) to collect and store network traffic data for all of these inter-node communication links in a network of any appreciable size. The storage problem becomes even worse if the nodes are grouped in some fashion, for now one must consider not only the individual node-to-node communications but also the group-to-group communications, which may themselves exist at multiple levels. Consequently, decisions must be made about which data and/or which nodes/communication links to monitor, so as to stay within the capacity limits of the network monitoring devices, while still giving network operators sufficient information about network traffic conditions to make informed decisions regarding network operations and control.

In a first aspect of the present invention, these needs are addressed by a methodology for determining which nodes/links (i.e., the network traffic data associated with such monitored nodes and/or links) to track in one monitoring device or a set of them, so as to ensure data integrity. In this procedure, each network monitoring device collects data for designated nodes/communication links in a computer network or network of networks. If data relating to any monitored nodes/links must be discarded to avoid exceeding the storage and/or processing capacity of a network monitoring device, the nodes/links are ranked and the discard decisions are based on those rankings. As discussed more fully below, in one embodiment such ranking is a function of an individual node's distance from the root and/or leaf positions within a logical hierarchy describing the arrangement of the nodes of the subject network, as well as of other factors (e.g., user preferences).

Then, for the second case of managing the aggregation and storage of monitored data among a network of monitoring devices, we introduce the concepts of “Appliances” and a “Director”. As used herein, the term Appliance refers to a network monitoring device assigned to collect network traffic data from designated nodes/links of one or more networks. The Director is a central network monitoring device to which the Appliances send specified information concerning designated ones of the monitored nodes/links. Together, the Director and the Appliances form a network of network monitoring devices.

Just as the individual network monitoring devices (the Appliances) are limited in their ability to store network traffic data concerning all of the myriad inter-node communication links, so too is the Director limited in its ability to store network traffic data received from the Appliances. Hence, the present invention further encompasses techniques for deciding which data concerning the monitored nodes/links to pass from the Appliances to the Director. As with the individual Appliances, such decisions involve rankings of nodes/links. In this way, network operators using the Director may readily access network diagnostic information at a single monitoring device, without that monitoring device being overwhelmed with information concerning the numerous network nodes and communication links.

As will become apparent, the ability to group network nodes/links into logical units, and to further group these logical units into higher-layer units, provides many of the advantages of the present methods; this grouping is therefore discussed before the details of the ranking algorithms used in connection with the present invention. First, however, note that for purposes of explanation numerous specific details are set forth herein in order to provide a thorough understanding of the invention. It will be appreciated by one with ordinary skill in the art, however, that these specific details need not be used to practice the present invention. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

The methods described herein may be used in conjunction with other techniques to allow network operators to detect problems and/or discover relevant information with respect to network application usage/performance and then isolate the problem/information to specific contributors (e.g., users, applications or network resources). More particularly, the present methods involve computations and analyses regarding many variables and are best performed or embodied as computer-implemented processes or methods (a.k.a. computer programs or routines) that may be rendered in any computer programming language including, without limitation, C#, C/C++, Fortran, COBOL, PASCAL, assembly language, markup languages (e.g., HTML, SGML, XML, VoXML), and the like, as well as object-oriented languages/environments such as the Common Object Request Broker Architecture (CORBA), Java™ and the like. In general, however, all of the aforementioned terms as used herein are meant to encompass any series of logical steps performed in a sequence to accomplish a given purpose.

In view of the above, it should be appreciated that some portions of this detailed description of the present invention are presented in terms of algorithms and symbolic representations of operations on data within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the computer science arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, it will be appreciated that throughout the description of the present invention, use of terms such as “processing”, “computing”, “calculating”, “determining”, “displaying” or the like refers to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The present invention can be implemented with an apparatus to perform the operations described herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer, selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.

The algorithms and processes presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method. For example, any of the methods according to the present invention can be implemented in hard-wired circuitry, by programming a general-purpose processor or by any combination of hardware and software. One of ordinary skill in the art will immediately appreciate that the invention can be practiced with computer system configurations other than those described below, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, DSP devices, network PCs, minicomputers, mainframe computers, and the like. The invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. The required structure for a variety of these systems will appear from the description below.

The methods of the present invention may be implemented using computer software. If written in a programming language conforming to a recognized standard, sequences of instructions designed to implement the methods can be compiled for execution on a variety of hardware platforms and for interface to a variety of operating systems. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein. Furthermore, it is common in the art to speak of software, in one form or another (e.g., program, procedure, application, etc.), as taking an action or causing a result. Such expressions are merely a shorthand way of saying that execution of the software by a computer causes the processor of the computer to perform an action or produce a result.

As indicated above, before the various ranking processes used as a basis for determining which information to store at a network monitoring device are described, the manner in which such information is collected will first be discussed. Turning then to FIG. 1, a computer network including multiple logical groupings (e.g., BG1, BG2) of network nodes is illustrated. Logical groupings such as BG1 and BG2 may be defined at any level. For example, they may mirror business groups, or may designate computers (or other nodes, e.g., printers, servers, image processors, scanners, or other computer equipment generally addressable within a computer network) performing similar functions, computers located within the same building, or any other aspect which a user or network operator/manager wishes to highlight. FIG. 1 shows one simple organization of a small number of computers and other network nodes, but those familiar with computer network operations/management will appreciate that the number of computers and network nodes, as well as the number of connections (communication links) between them, may be significantly larger. Modern network configurations are mutable and complex, which is one of the reasons why the present invention is useful. Information representing the total utilization of all nodes in particular directions or activities provides much greater visibility into overall network traffic than does a large collection of individualized node information. The present invention allows network traffic to be grouped into logical groups, and groups of logical groups, that a user can configure in order to gain visibility into network traffic at various hierarchical levels.

In FIG. 1, lines between nodes and other entities indicate network communication links, which may be any mode of establishing a connection between nodes, including wired and/or wireless connections. Moreover, a firewall (shown as the dashed line) surrounds a geographic collection of networked nodes and separates components of an internal network from an external network 6. A network traffic monitoring device 8 is shown at the firewall. However, the network traffic monitoring device 8 may be located within the internal network, within the external network 6, or anywhere else that allows for the collection of network traffic information. Moreover, network traffic monitoring device 8 need not be “inline.” That is, traffic need not pass through network traffic monitoring device 8 in order to pass from one network node to another. The network traffic monitoring device 8 can be a passive monitoring device, e.g., spanning a switch or router, whereby all the traffic is copied to a switch span port which passes traffic to network traffic monitoring device 8.

In the example shown in FIG. 1, BG1 contains several internal network nodes N101, N102, N103, and N104 and external nodes N105, N106 and N107. Similarly, BG2 contains several internal network nodes N201, N202, N203, N204, N205, N206, and external nodes N207, N208, N209, N210 and N211. A network node may be any computer or device on the network that communicates with other computers or devices on the network. Each node may function as a client, a server, or both. For example, node N103 is shown as a database which is connected to node N104, a web application server, via a network link 10. In this configuration, it is typical for node N104 to function as a client of node N103 by requesting database results. However, N104 is also depicted as connected to the external network 6 via network link 12. In this configuration, it is typical for N104 to function as a server, which returns results in response to requests from the external network. Similarly, database node N103, which functions as a server to N104, is shown connected to node N107 via a network link 14. N107 may upload information to the database via link 14, whereby N107 is functioning as a server and N103 is functioning as a client. However, N107 is also shown connected to the external network 6 via link 16. This link could indicate that N107 is browsing the Internet and functioning as a client.

Furthermore, network nodes need not be within the internal network in order to belong to a logical group. For example, traveling employees may connect to the logical group network via a virtual private network (VPN) or via ordinary network transport protocols through an external network such as the Internet. As shown in FIG. 1, network nodes N105, N106, and N107 belong to logical group BG1 but are outside the firewall, and may be geographically distant from the other network nodes in BG1. Similarly, network nodes N207, N208, N209, N210, and N211 are members of logical group BG2 but are physically removed from the other members of group BG2. It is important to note that the firewall in this configuration is for illustrative purposes only and is not a required element in networks where the present invention may be practiced. The separation between internal and external nodes of a network may be formed by geographic distance (as described above), or by networking paths (that may be disparate or require many hops for the nodes to connect to one another regardless of their geographic proximity).

For a relatively small network such as that shown in FIG. 1, a single network monitoring device 8 may suffice to collect and store network traffic data for all nodes and communication links of interest. However, for a network of any appreciable size (or for a network of networks), this will likely not be the case. Thus, decisions need to be made about what data to store for which nodes/groups of nodes and/or links/groups of links.

To further illustrate this point, consider a network monitoring device located in a datacenter (call it the New York (NY) datacenter) that monitors traffic between the New York office of an enterprise and its remote branch offices around the world. The NY enterprise and each of the branch offices may be organized into multiple logical groups of nodes. We will call each such logical group a “business group” or BG; however, it should be recognized that BGs could be created along any of the lines discussed above (e.g., any user-desired definition). Indeed, some of the BGs may themselves include other BGs, forming what are termed herein business group organizations, or BGOs. See, for example, FIG. 3, in which a BGO called “California” (CA) includes multiple BGs: “San Francisco” (SF), “Los Angeles” (LA), and “San Diego” (SD), each of which may itself be made up of other BGs (e.g., the Los Angeles BG may include BGs for Santa Monica (SM), Riverside (R) and Orange County (OC)) and/or nodes. Thus, the various BGs may be grouped in various hierarchies, within which there may be many node-to-node and group-to-group (BG-to-BG and/or BGO-to-BGO) connections (representing inter-group communications).

In addition to the above, for each communication link under consideration there is a host of metrics that might be collected by a network monitoring device. Among these are: Goodput, Payload, Throughput, Transaction Throughput, Packet Loss, Retransmission Delay, Retransmission Rate, Round Trip Time, Application Response Rate, Application Response Time, Client Reset Rate, Connection Duration, Connection Established Rate, Connection Request Rate, Connection Setup Time, Connections Failed Rate, Data Transfer Time, Server Reset Rate and Time to First Byte. These metrics can be further subdivided on the basis of the role being played by the content originator and the content requester. As this exercise should make clear, for networks (or networks of networks) of any appreciable size there are far too many data points for a single monitoring device to cope with. That is, a single device cannot reasonably store data concerning all of the various communication links within such a network (e.g., due to limits on physical storage devices, bandwidth utilization, etc.), and so decisions about what data to store and what data not to store at the monitoring device need to be made.
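A rough worst-case count makes the scale concrete. The sketch below assumes every pair of monitored entities communicates (a full mesh), that each link carries all 19 metrics listed above, and that each metric is split into two roles (content originator and content requester). These assumptions give an upper bound rather than a measured figure, and the function name `series_count` is hypothetical.

```python
# The 19 per-link metrics listed in the text.
METRICS = [
    "Goodput", "Payload", "Throughput", "Transaction Throughput", "Packet Loss",
    "Retransmission Delay", "Retransmission Rate", "Round Trip Time",
    "Application Response Rate", "Application Response Time", "Client Reset Rate",
    "Connection Duration", "Connection Established Rate", "Connection Request Rate",
    "Connection Setup Time", "Connections Failed Rate", "Data Transfer Time",
    "Server Reset Rate", "Time to First Byte",
]

def series_count(num_entities: int, roles: int = 2) -> int:
    """Upper bound on the number of metric series to store.

    ASSUMPTION: every pair of entities (nodes or groups) communicates, so there
    are C(n, 2) links, each with every metric split by `roles`."""
    links = num_entities * (num_entities - 1) // 2
    return links * len(METRICS) * roles

print(series_count(1000))  # 18981000
```

Adding group-to-group links at several hierarchy levels only increases this figure, which is why the ranking and pruning described next are needed.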

The solution provided by the present invention allows for such decision-making. In one embodiment of the invention, a network monitoring device consults user-supplied definitions of the BGs/BGOs for which it is responsible and “builds” a BG/BGO hierarchy. The definitions of the BGs/BGOs may be stored locally at a network monitoring device or in a central location to which the network monitoring device has access. These definitions comprise configuration files that include user-supplied instructions regarding the BGs and BGOs to be monitored and will, generally, define the types of statistics or metrics for which data is to be collected and the organization of the network nodes. The precise details of such instructions are not critical to the present invention, and in some cases such instructions may be provided by manually configuring a network and its associated monitoring devices on a port-by-port level. What is important is that the network monitoring device has some means of determining which nodes it is responsible for monitoring.
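Because the configuration format is left open, the following is only a minimal sketch under an assumed JSON layout: each group lists its member nodes and its child groups, and the device resolves the set of nodes it is responsible for by walking nested group definitions. The layout, the key names (`metrics`, `groups`, `nodes`) and the function `responsible_nodes` are hypothetical; the node and group names come from FIG. 1.

```python
import json
from typing import Dict, List, Set

# HYPOTHETICAL configuration layout; the document does not prescribe one.
CONFIG_TEXT = """
{
  "metrics": ["Throughput", "Round Trip Time"],
  "groups": {
    "BG1": {"nodes": ["N101", "N102", "N103", "N104"], "groups": []},
    "BG2": {"nodes": ["N201", "N202"], "groups": []},
    "Enterprise": {"nodes": [], "groups": ["BG1", "BG2"]}
  }
}
"""

def responsible_nodes(config: Dict, root: str) -> Set[str]:
    """Collect every node reachable from `root` through nested group definitions."""
    groups = config["groups"]
    found: Set[str] = set()
    seen: Set[str] = set()
    stack: List[str] = [root]
    while stack:
        name = stack.pop()
        if name in seen:  # guard against cyclic or repeated group references
            continue
        seen.add(name)
        found.update(groups[name]["nodes"])
        stack.extend(groups[name]["groups"])
    return found

config = json.loads(CONFIG_TEXT)
print(sorted(responsible_nodes(config, "Enterprise")))
# ['N101', 'N102', 'N103', 'N104', 'N201', 'N202']
```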

With the BGO information made available, a network monitoring device constructs its relevant BGO hierarchy. In doing so, the network monitoring device considers only those links which are active, that is, those links over which communications are actually taking place. The BGO hierarchy may be regarded as a “tree-like” structure having a root node, branches and leaf nodes. The “root” node represents the highest hierarchical level in the BGO, while the “leaf” nodes represent the lowest hierarchical levels (e.g., individual computer resources). “Branch” nodes are nodes interconnecting leaf nodes to the root node, and there may be any number of such branch nodes (including none) between the root node and any of the leaf nodes. A branch hierarchy is constructed by combining the network data collected for each of the leaf nodes within a branch and storing that combination with reference to the common branching node from which the leaf nodes depend. For each branch of the BGO hierarchy, on a branch-by-branch basis (starting with the leaf nodes thereof), decisions are made about whether or not to store the monitored data for those nodes/links.
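The branch combination step can be sketched as a post-order walk over the FIG. 3 hierarchy: each leaf contributes its own measured value, each branch node stores the combined value of everything beneath it, and inactive entries (zero traffic) are not recorded. The choice of summed byte counts as the combined statistic, the per-leaf values, and the function name `rollup` are assumptions for illustration; SF and SD are treated as leaves here.

```python
from typing import Dict, List

# Parent -> children for the FIG. 3 example; SF and SD are treated as leaves.
HIERARCHY: Dict[str, List[str]] = {
    "CA": ["SF", "LA", "SD"],
    "LA": ["SM", "R", "OC"],
}

# HYPOTHETICAL per-leaf byte counts; R has no active traffic.
LEAF_BYTES: Dict[str, int] = {"SF": 500, "SD": 200, "SM": 100, "R": 0, "OC": 300}

def rollup(node: str, children: Dict[str, List[str]],
           leaf_bytes: Dict[str, int], totals: Dict[str, int]) -> int:
    """Post-order walk: a leaf reports its own bytes; a branch node stores the
    combined bytes of everything beneath it. Only active entries are kept."""
    kids = children.get(node, [])
    if not kids:
        total = leaf_bytes.get(node, 0)
    else:
        total = sum(rollup(k, children, leaf_bytes, totals) for k in kids)
    if total > 0:  # skip inactive nodes (no traffic observed)
        totals[node] = total
    return total

totals: Dict[str, int] = {}
rollup("CA", HIERARCHY, LEAF_BYTES, totals)
print(totals["LA"], totals["CA"])  # 400 1100
```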

It should be apparent that in the process of constructing such a hierarchy, where each higher layer includes combined statistics from lower layers, for a hierarchy of any significant depth it may not be possible to store all of the raw data and the combinations thereof for every level of the hierarchical tree. Stated differently, the storage and/or processing capabilities of the network monitoring device may demand that some of the data concerning some of the leaf nodes and/or branching nodes of the BGO hierarchy be intentionally dropped.

To accommodate this reality, the present invention provides for ranking and pruning the BGO hierarchy as it is being constructed by the network monitoring device. Importantly, this ranking and pruning process can be performed without the need for the network monitoring device to store data for each node of the entire BGO structure. Thus, the BGO hierarchy can be constructed “on-the-fly”, with each branch being pruned as needed during that process so as to accommodate the network monitoring device's storage and/or processing limitations.
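One way to prune on the fly, sketched under stated assumptions, is to consume the hierarchy one branch at a time while keeping a bounded min-heap of at most `capacity` retained entries. When the heap is full, a newly seen entry displaces the lowest-ranked retained entry only if it outranks it, so the full structure is never held in memory at once. Each entry's rank is assumed to be precomputed (for instance by the composite ranking described below); the rank and data values, the entry layout and the function `prune_on_the_fly` are hypothetical, and the bounded heap is a technique chosen here for illustration rather than one named in the text.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, float, int]  # (node, precomputed rank, stored data e.g. bytes)

def prune_on_the_fly(branches: Iterable[List[Entry]], capacity: int) -> Dict[str, int]:
    """Keep at most `capacity` entries while branches arrive one at a time,
    dropping the lowest-ranked data whenever capacity would be exceeded."""
    if capacity <= 0:
        return {}
    heap: List[Tuple[float, str, int]] = []  # min-heap: lowest rank at heap[0]
    for branch in branches:
        for node, rank, data in branch:
            item = (rank, node, data)  # equal ranks are ordered by node name
            if len(heap) < capacity:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)  # evict the lowest-ranked entry
    return {node: data for _, node, data in heap}

branches = [
    [("SM", 1801.0, 100), ("OC", 1801.0, 300), ("LA", 1902.0, 400)],
    [("SF", 1901.0, 500)],
    [("SD", 1901.0, 200)],
    [("CA", 22003.0, 1100)],
]
print(sorted(prune_on_the_fly(branches, capacity=3)))  # ['CA', 'LA', 'SF']
```

Breaking equal ranks by node name is an arbitrary but deterministic choice; a real device might prefer a user-configured tie-breaker.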

The ranking algorithm used by the network monitoring device as it constructs each branch of the BGO hierarchy may be any process that permits the above-described operations. In one embodiment, the algorithm used is:

$$R_{\text{composite}} = F_{\text{devices}}(a) + F_{\text{rank}}(r) + F_{\text{depth}}(d) \qquad (1)$$

where $R_{\text{composite}}$ is a composite ranking of the node/link under consideration.

In equation (1), $F_{\text{devices}}(a)$ is a constant that is proportional to $a$, the number of monitoring devices designated to have their data for the associated BG or BGO (henceforth referred to as a “node”) aggregated onto a central monitoring device (the Director), as discussed in further detail below. This constant is intended to give the highest-ranking priority to those nodes whose data must be aggregated to the central monitoring device. Put differently, the $F_{\text{devices}}(a)$ term ensures that nodes for which at least one monitoring device contributes to the group (i.e., $a > 0$) receive the highest rankings. In one embodiment of the present invention, $F_{\text{devices}}(a)$ is a monotonically increasing function of $a$, to ensure that preference is given to nodes with higher values of $a$.

The $F_{\text{rank}}(r)$ term is a function whose value monotonically decreases with increasing values of $r$, the distance (measured in the number of nodes) between the associated root or “top” node of the hierarchy and the node associated with $R_{\text{composite}}$ (e.g., the number of hops within the BGO hierarchical tree). The $F_{\text{rank}}(r)$ term is intended to give preference to nodes that are higher in the BGO tree hierarchy.

The $F_{\text{depth}}(d)$ term is a function whose value monotonically increases with increasing values of $d$, the distance (measured in the number of nodes) between the leaf or “bottom” node of the hierarchy and the node associated with $R_{\text{composite}}$. The $F_{\text{depth}}(d)$ term acts as a tie-breaker: where nodes have the same values of $F_{\text{devices}}(a)$ and $F_{\text{rank}}(r)$, it gives preference to those nodes that have “deeper” tree hierarchies, as reflected by the value of $d$. In one embodiment of the present invention, the relative magnitudes of the three terms may be expressed as $F_{\text{devices}}(a) \gg F_{\text{rank}}(r) \gg F_{\text{depth}}(d)$ for expected values of $a$ = 0 to ~100, $r$ = 1 to ~20 and $d$ = 1 to ~20.
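Equation (1) fixes only the monotonicity and relative magnitude of each term, not their exact form. The sketch below picks simple linear functions that satisfy $F_{\text{devices}}(a) \gg F_{\text{rank}}(r) \gg F_{\text{depth}}(d)$ over the stated ranges (a one-step change in $a$ outweighs any change in $r$ and $d$, and a one-step change in $r$ outweighs any change in $d$). It also computes $r$ and $d$ for the FIG. 3 hierarchy. The weights (10,000 and 100), the bound `R_MAX`, the inclusive node counting (root has $r = 1$, leaf has $d = 1$), the use of the deepest leaf below a node for $d$, and the device count for CA are all assumptions.

```python
from typing import Dict, List, Tuple

R_MAX = 20  # ASSUMED upper bound on r, taken from the "~20" in the text

def f_devices(a: int) -> float:
    """Monotonically increasing in a; dominates the other two terms."""
    return 10_000.0 * a

def f_rank(r: int) -> float:
    """Monotonically non-increasing in r; nodes nearer the root score higher."""
    return 100.0 * max(0, R_MAX + 1 - r)

def f_depth(d: int) -> float:
    """Monotonically increasing in d; breaks ties in favor of deeper subtrees."""
    return float(d)

def composite_rank(a: int, r: int, d: int) -> float:
    """Equation (1): R_composite = F_devices(a) + F_rank(r) + F_depth(d)."""
    return f_devices(a) + f_rank(r) + f_depth(d)

def positions(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Return {node: (r, d)}. ASSUMPTION: nodes are counted inclusively, so the
    root has r = 1 and a leaf has d = 1; d uses the deepest leaf below the node."""
    out: Dict[str, Tuple[int, int]] = {}

    def walk(node: str, r: int) -> int:
        d = 1 + max((walk(k, r + 1) for k in children.get(node, [])), default=0)
        out[node] = (r, d)
        return d

    walk(root, 1)
    return out

hierarchy = {"CA": ["SF", "LA", "SD"], "LA": ["SM", "R", "OC"]}
devices = {"CA": 2}  # HYPOTHETICAL: two Appliances report CA to the Director
for node, (r, d) in positions(hierarchy, "CA").items():
    print(node, r, d, composite_rank(devices.get(node, 0), r, d))
```

With these weights the lexicographic ordering holds only while $r$ and $d$ stay within about 20; wider ranges would need larger gaps between the weights.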

The rank ($R_{\text{composite}}$) of each node/link is recorded in a database maintained by the network monitoring device. Thereafter, as each branch of the hierarchy is constructed, nodes/links may be pruned (i.e., data collected for such nodes/links may be dropped) according to those ranks and the storage/processing capacity of the network monitoring device. Alternatively, or in addition, pruning decisions may be based on a user-configured threshold for the number of nodes for which to store data.
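The combination of device capacity and a user-configured node-count threshold can be sketched as follows. A plain dictionary stands in for the rank database, the effective limit is taken as the smaller of the two bounds, and equal ranks are ordered by name. The function `select_retained` and the example ranks are hypothetical.

```python
from typing import Dict, Optional, Set

def select_retained(rank_db: Dict[str, float], capacity: int,
                    user_limit: Optional[int] = None) -> Set[str]:
    """Return the nodes/links whose data is kept: the highest-ranked entries,
    bounded by device capacity and, if set, a user-configured threshold."""
    limit = capacity if user_limit is None else min(capacity, user_limit)
    ordered = sorted(rank_db, key=lambda n: (-rank_db[n], n))  # best first
    return set(ordered[:max(0, limit)])

# Dictionary standing in for the rank database (hypothetical values).
rank_db = {"CA": 22003.0, "LA": 1902.0, "SF": 1901.0, "SD": 1901.0, "SM": 1801.0}
print(sorted(select_retained(rank_db, capacity=4)))                # ['CA', 'LA', 'SD', 'SF']
print(sorted(select_retained(rank_db, capacity=4, user_limit=2)))  # ['CA', 'LA']
```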

The foregoing has thus addressed how to determine which nodes/links (i.e., the network traffic data associated with such monitored nodes and/or links) to track in an individual network monitoring device. The discussion now turns to the second aspect of the present invention: managing the aggregation and storage of such monitored data among a network of monitoring devices. In this discussion we refer to two types of network monitoring devices, namely Appliances and a central Director.

Earlier it was noted that the network monitoring device 8 illustrated in FIG. 1 may be capable of storing all relevant network traffic information for a relatively small network. When the network becomes large, however, this is no longer true, and decisions must be made about which data to store and which to discard. Consider now the case where not only is the network (or network of networks) under consideration large, but more than one network monitoring device is also used.

Returning to the earlier example, such a situation may arise where, in addition to the NY datacenter, a second datacenter is located in California (CA). Just like NY, the CA datacenter sends/receives traffic to/from the same set of remote branch offices distributed throughout the world. However, the monitoring device in NY (call it Appliance 1) does not “see” any of the data transferred through the CA datacenter; that is, Appliance 1 does not capture traffic to/from the CA datacenter. Therefore, a separate monitoring device (Appliance 2) is deployed in the CA datacenter to monitor traffic to/from that datacenter.

But if a network operator now wants to assess the total traffic between the London branch office and each of the NY and CA datacenters, the information collected by each of the Appliances must somehow be aggregated. In accordance with the present invention, this aggregation is performed at a Director, a central network monitoring device. Collectively, the Director and the various Appliances make up a network of network monitoring devices, an example of which is illustrated in FIG. 2.
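A minimal sketch of the Director-side aggregation for the London example follows. It assumes, hypothetically, that each Appliance reports a mapping from a (branch office, datacenter) pair to a byte count; the report format and the figures are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical report format: (branch office, datacenter) -> bytes seen by one Appliance.
Report = Dict[Tuple[str, str], int]


def aggregate(reports: Iterable[Report]) -> Counter:
    """Director-side merge: sum each group-to-group counter across Appliances."""
    total: Counter = Counter()
    for report in reports:
        total.update(report)  # Counter.update adds counts rather than replacing them
    return total


appliance_1 = {("London", "NY"): 500, ("Tokyo", "NY"): 120}  # NY datacenter view
appliance_2 = {("London", "CA"): 300}                        # CA datacenter view
director = aggregate([appliance_1, appliance_2])

# Total London traffic across both datacenters.
london_total = sum(v for (branch, _), v in director.items() if branch == "London")
print(london_total)  # 800
```

Neither Appliance alone can produce this total, which is why the per-Appliance reports must be merged centrally.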

Within network 20, central network monitoring device 22 receives and aggregates network traffic information from two individual network monitoring devices 24a and 24b. Monitoring device 24a is responsible for collecting network traffic data associated with a first network 26a; monitoring device 24b is responsible for collecting network traffic data associated with a second network 26b. Networks 26a and 26b may each include multiple nodes, interconnected with one another and/or with nodes in the other network by a myriad of communication links, which may be direct or indirect (e.g., links that traverse other networks not shown in this illustration). Thus, each of the network monitoring devices 24a and 24b may be responsible for collecting data concerning multiple groupings (logical and/or physical) of nodes in its associated network 26a or 26b. That is, the network operator may, for convenience, define multiple logical and/or physical groupings of nodes in each of the networks 26a and 26b and configure the respective network monitoring devices 24a and 24b to store and track network traffic information accordingly. The total number of monitored nodes/links may be quite large.

Such a network of network monitoring devices poses several challenges. For example, if the network traffic information associated with the various BG/BGO-to-BG/BGO communications sought by the Director exceeds the Director's storage capacity, which group-to-group communications should be kept? Also, so as not to overwhelm the available bandwidth within the network of network monitoring devices, how can the volume of information sent between the Appliances and the Director be kept manageable? Finally, how can one ensure the completeness (i.e., integrity) of the information for the various aggregations being performed? For example, if an operator wants all the traffic between London and the two datacenters aggregated, how can the operator be certain that each Appliance has stored the traffic between its datacenter and London if each Appliance is pruning the number of nodes/links for which it stores traffic in accordance with the above-described processes? The present invention addresses these issues by employing a global ranking process somewhat similar to the one discussed above for a single network monitoring device.

Once the individual BGO hierarchies have been constructed by the Appliances, decisions about which data to transfer to the Director can be made. Because the Director also has limits on the amount of data it can store/process, a ranking algorithm is used to determine which data to store and which to discard; this algorithm may include a bias ensuring that any nodes/links the network operator has indicated should be tracked at this level are always included. One example of such a ranking algorithm, used to select the links to be transferred to the Director, is:

$$R'_{\text{composite}} = F_{\text{devices}}(\max(a_1, a_2)) + F_{\text{rank}}(r_1, r_2) + F_{\text{depth}}(d_1, d_2) \qquad (2)$$

where $r$, $d$ and $a$ denote the same metrics as above, and the subscripts 1 and 2 indicate the values associated with the two BGs/BGOs that the link under consideration interconnects. For example, for a CA-to-NY BGO-to-BGO link, subscript 1 might designate a node within the CA Appliance hierarchy and subscript 2 a node within the NY Appliance hierarchy.
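Equation (2), together with the operator bias described above, might be sketched as follows. The two-argument forms of $F_{\text{rank}}$ and $F_{\text{depth}}$ are not specified in the text, so this sketch assumes, hypothetically, that each sums per-endpoint scores; the `Endpoint` structure, the weights, the link names and the `pinned` set of operator-designated links are likewise illustrative assumptions.

```python
from typing import Dict, List, NamedTuple, Set

R_MAX, D_MAX = 20, 20  # assumed bounds on r and d, as in the single-device case


class Endpoint(NamedTuple):
    """Metrics for one end of a BGO-to-BGO link (hypothetical structure)."""
    a: int  # monitoring devices whose data for this node is aggregated at the Director
    r: int  # hops from the root of the node's hierarchy
    d: int  # hops to the leaf of the node's hierarchy


def f_devices(a: int) -> int:
    return 10_000 * a  # dominates the other two terms


def f_rank(r1: int, r2: int) -> int:
    # Assumed two-argument form: sum of per-endpoint scores, each decreasing in r.
    return sum(100 * (R_MAX + 1 - min(r, R_MAX)) for r in (r1, r2))


def f_depth(d1: int, d2: int) -> int:
    # Assumed two-argument form: sum of per-endpoint scores, each increasing in d.
    return min(d1, D_MAX) + min(d2, D_MAX)


def link_rank(e1: Endpoint, e2: Endpoint) -> int:
    """R'_composite for the link between e1 and e2, following equation (2)."""
    return f_devices(max(e1.a, e2.a)) + f_rank(e1.r, e2.r) + f_depth(e1.d, e2.d)


def select_links(ranks: Dict[str, int], capacity: int, pinned: Set[str]) -> List[str]:
    """Operator-pinned links are always kept; remaining slots go to the highest ranks."""
    chosen = sorted(pinned.intersection(ranks))
    rest = sorted((k for k in ranks if k not in pinned), key=ranks.get, reverse=True)
    return chosen + rest[:max(capacity - len(chosen), 0)]


links = {
    "CA:Eng-NY:Eng": link_rank(Endpoint(a=2, r=1, d=3), Endpoint(a=1, r=2, d=1)),
    "CA:Lab-NY:Lab": link_rank(Endpoint(a=0, r=4, d=1), Endpoint(a=0, r=5, d=1)),
    "CA:HR-NY:HR": link_rank(Endpoint(a=0, r=2, d=2), Endpoint(a=0, r=3, d=1)),
}
print(select_links(links, capacity=2, pinned={"CA:Lab-NY:Lab"}))
# ['CA:Lab-NY:Lab', 'CA:Eng-NY:Eng']
```

Using $\max(a_1, a_2)$ means a link inherits the higher device priority of its two endpoints, so a link touching any node that must be aggregated at the Director outranks every link that touches none.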

Thus, based on the rankings of the BGO-to-BGO hierarchical trees on the distributed edge monitoring devices (i.e., the Appliances), a central network monitoring device (the Director) can construct a composite BGO-to-BGO hierarchy encompassing the traffic seen by all the distributed edge monitoring devices. Indeed, this process can be repeated for network monitoring device hierarchies with multiple levels, with each monitoring device at successively higher layers of the hierarchy receiving data from lower-layer devices and pruning its BGO-to-BGO hierarchical trees accordingly.
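One way the Director might assemble a composite hierarchy is to union the BGO trees reported by the Appliances and derive ranking metrics from the union. In this hypothetical sketch each Appliance reports its tree as a set of (parent, child) edges, $a$ is approximated by the number of Appliances that report a node, and $r$ is computed by breadth-first search from the root, which is counted as 0 hops.

```python
from collections import Counter, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # hypothetical (parent BGO, child BGO) pair


def merge_hierarchies(trees: Iterable[Set[Edge]]) -> Tuple[Set[Edge], Counter]:
    """Union the Appliances' BGO trees; count how many Appliances report each node."""
    edges: Set[Edge] = set()
    reporters: Counter = Counter()
    for tree in trees:
        edges |= tree
        nodes = {n for edge in tree for n in edge}
        reporters.update(nodes)  # each Appliance counts at most once per node
    return edges, reporters


def depth_from_root(edges: Set[Edge], root: str) -> Dict[str, int]:
    """Hops from the root (the metric r) for every node reachable in the composite tree."""
    children: Dict[str, List[str]] = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    r = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in r:
                r[child] = r[node] + 1
                queue.append(child)
    return r


ny = {("World", "Europe"), ("Europe", "London")}
ca = {("World", "Europe"), ("Europe", "London"), ("World", "Asia")}
edges, a = merge_hierarchies([ny, ca])
print(a["London"], a["Asia"])                     # 2 1
print(depth_from_root(edges, "World")["London"])  # 2
```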

Importantly, the ranking and pruning processes described herein may be implemented at network monitoring devices at any level within a network of network monitoring devices (e.g., one in which first-layer Appliances report up to second-layer Appliances, which in turn report up to higher-layer Appliances, until reports finally reach a Director). That is, network monitoring devices at any point within such a network may employ these methods to keep the network traffic information related to group-to-group communications bounded.
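Because the same rank-and-prune step can run at every layer, a multi-level network of monitoring devices can be sketched recursively: each device merges what its lower-layer devices report with its own data, then keeps only as many links as its capacity allows. The `Device` model, the capacities, the link names and the rule of keeping the higher rank when two devices report the same link are hypothetical simplifications; in particular, this sketch does not recompute $R'_{\text{composite}}$ at each layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Device:
    """A monitoring device at any layer (Appliance or Director); hypothetical model."""
    name: str
    capacity: int                                          # max links it can keep
    local: Dict[str, int] = field(default_factory=dict)    # link -> rank seen locally
    children: List["Device"] = field(default_factory=list)


def report_up(device: Device) -> Dict[str, int]:
    """Merge lower-layer reports with local data, then prune to this device's capacity."""
    merged = dict(device.local)
    for child in device.children:
        for link, rank in report_up(child).items():
            # Assumption: when two devices report the same link, keep the higher rank.
            merged[link] = max(rank, merged.get(link, rank))
    top = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:device.capacity]
    return dict(top)


edge_ny = Device("NY", 2, {"London-NY": 9, "Paris-NY": 5, "Tokyo-NY": 1})
edge_ca = Device("CA", 2, {"London-CA": 8, "Tokyo-CA": 4, "Paris-CA": 2})
regional = Device("Layer2", 3, children=[edge_ny, edge_ca])
director = Device("Director", 2, children=[regional])
print(report_up(director))  # {'London-NY': 9, 'London-CA': 8}
```

Each layer forwards at most `capacity` links, so the volume sent upward stays bounded no matter how many devices sit below it.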

Thus, methods and systems to facilitate centralized network monitoring for distributed networks have been described. Although these methods and systems were discussed with reference to particular embodiments of the present invention, such embodiments should not be read as unnecessarily limiting the scope of the invention. Instead, the invention should only be measured in terms of the claims, which follow.

1. A network monitoring system comprising: a plurality of network monitoring devices that monitor network traffic data from a plurality of nodes of a network, each network monitoring device being configured to collect network traffic data from an assigned subset of the nodes in the network, and a central network monitoring device that is configured to receive at least a portion of the network traffic data collected by the network monitoring devices; wherein at least one of the network monitoring devices is configured to select fewer nodes than its assigned subset of nodes for collecting network traffic data, based on a capacity of the network monitoring device and a priority associated with each node of its assigned subset of nodes.
 2. The network monitoring system of claim 1, wherein the priority associated with at least one of the nodes is based on a number of network monitoring devices that provide network traffic data associated with this node to the central network monitoring device.
 3. The network monitoring system of claim 1, wherein each subset of assigned nodes includes a root node, each of the nodes of the subset being hierarchically related to the root node, and the priority associated with each node is based on a hierarchical distance of the node from the root node.
 4. The network monitoring system of claim 1, wherein each subset of assigned nodes includes leaf nodes and branch nodes arranged in a hierarchy, and the priority associated with each branch node is based on a hierarchical distance of the branch node from a hierarchically-closest leaf node.
 5. The network monitoring system of claim 1, wherein the priority associated with each link between nodes is dependent upon a number of network monitoring devices that provide network traffic data associated with each node of the link to the central network monitoring device.
 6. The network monitoring system of claim 1, wherein the central network monitoring device controls the portion of the network traffic data received from the network monitoring devices based on a capacity of the central network monitoring device and a priority associated with links between the nodes in the network.
 7. The network monitoring system of claim 1, wherein each subset of assigned nodes includes a root node, each of the nodes of the subset being hierarchically related to the root node, and the priority associated with each link is based on a hierarchical distance of each node of the link from its root node.
 8. The network monitoring system of claim 1, wherein each subset of assigned nodes includes leaf nodes and branch nodes arranged in a hierarchy, and the priority associated with each link is based on a hierarchical distance of each node of the link from a hierarchically-closest leaf node, the hierarchical distance of a leaf node from a hierarchically-closest leaf node being zero.
 9. A method comprising: assigning, via a central monitoring device, a subset of nodes in a network to each of a plurality of network monitoring devices that are configured to monitor network traffic data of the assigned subset of nodes, selecting, at at least one network monitoring device, fewer nodes to monitor than its assigned subset of nodes, based on a capacity of the at least one network monitoring device and a priority associated with each node of its assigned subset of nodes, and collecting network traffic data from the selected fewer nodes, receiving, at the central monitoring device, at least a portion of the network traffic data collected by the plurality of network monitoring devices, and reporting, by the central monitoring device, one or more statistics based on the received network traffic data.
 10. The method of claim 9, wherein the priority associated with at least one of the nodes is based on a number of network monitoring devices that provide network traffic data associated with this node to the central network monitoring device.
 11. The method of claim 9, wherein each subset of assigned nodes includes a root node, each of the nodes of the subset being hierarchically related to the root node, and the priority associated with each node is based on a hierarchical distance of the node from the root node.
 12. The method of claim 9, wherein each subset of assigned nodes includes leaf nodes and branch nodes arranged in a hierarchy, and the priority associated with each branch node is based on a hierarchical distance of the branch node from a hierarchically-closest leaf node.
 13. The method of claim 9, including selecting, by the central network monitoring device, the portion of the network traffic data to be received from the network monitoring devices based on a capacity of the central network monitoring device and a priority associated with links between the subsets of nodes in the network.
 14. The method of claim 13, wherein the priority associated with each link between subsets is dependent upon a number of network monitoring devices that provide network traffic data associated with each node of the link to the central network monitoring device.
 15. The method of claim 13, wherein each subset of assigned nodes includes a root node, each of the nodes of the subset being hierarchically related to the root node, and the priority associated with each link is based on a hierarchical distance of each node of the link from its root node.
 16. The method of claim 13, wherein each subset of assigned nodes includes leaf nodes and branch nodes arranged in a hierarchy, and the priority associated with each link is based on a hierarchical distance of each node of the link from a hierarchically-closest leaf node, the hierarchical distance of a leaf node from a hierarchically-closest leaf node being zero.
 17. A non-transitory computer readable medium that includes a computer program that, when executed at a network monitoring device, causes the device to: receive an assignment of a subset of nodes in a network to monitor for network traffic data, select fewer nodes to monitor than the assigned subset of nodes, based on a capacity of the network monitoring device and a priority associated with each node of its assigned subset of nodes, collect network traffic data from the selected fewer nodes, and communicate at least a portion of the collected network traffic data to a central monitoring device.
 18. The medium of claim 17, wherein the priority associated with at least one of the nodes is based on a number of other network monitoring devices that provide network traffic data associated with this node to the central network monitoring device.
 19. The medium of claim 17, wherein the subset of assigned nodes includes a root node, each of the nodes of the subset being hierarchically related to the root node, and the priority associated with each node is based on a hierarchical distance of the node from the root node.
 20. The medium of claim 17, wherein the subset of assigned nodes includes leaf nodes and branch nodes arranged in a hierarchy, and the priority associated with each branch node is based on a hierarchical distance of the branch node from a hierarchically-closest leaf node.
 21. A non-transitory computer readable medium that includes a computer program that, when executed at a central monitoring device, causes the device to: assign a subset of nodes of a network to each of a plurality of network monitoring devices, each network monitoring device being configured to collect network traffic data from the subset of nodes, and receive a portion of the network traffic data from the network monitoring devices based on a capacity of the central network monitoring device and a priority associated with links between the subsets of nodes.
 22. The medium of claim 21, wherein the priority associated with each link between subsets is dependent upon a number of network monitoring devices that provide network traffic data associated with each node of the subset to the central network monitoring device.
 23. The medium of claim 21, wherein each subset of assigned nodes includes a root node, each of the nodes of the subset being hierarchically related to the root node, and the priority associated with each link is based on a hierarchical distance of each node of the link from its root node.
 24. The medium of claim 21, wherein each subset of assigned nodes includes leaf nodes and branch nodes arranged in a hierarchy, and the priority associated with each link is based on a hierarchical distance of each node of the link from a hierarchically-closest leaf node, the hierarchical distance of a leaf node from a hierarchically-closest leaf node being zero.